
A Multimodal Conversational Agent for Tabular Data Analysis

Awad, Mohammad Nour Al, Ivanov, Sergey, Tikhonova, Olga, Khodnenko, Ivan

arXiv.org Artificial Intelligence

Large language models (LLMs) can reshape information processing by handling data analysis, visualization, and interpretation in an interactive, context-aware dialogue with users, including voice interaction, while maintaining high performance. The system lets users query datasets with voice or text instructions and receive answers as plots, tables, statistics, or spoken explanations. Built on LLMs, the suggested design combines the OpenAI Whisper automatic speech recognition (ASR) system, the Qwen-coder code-generation model, custom sandboxed execution tools, and the Coqui text-to-speech (TTS) library within an agentic orchestration loop. Unlike text-only analysis tools, it adapts responses across modalities and supports multi-turn dialogues grounded in dataset context. In an evaluation of 48 tasks on three datasets, our prototype achieved 95.8% accuracy with model-only generation time under 1.7 seconds (excluding ASR and execution time). A comparison across five LLM sizes (1.5B-32B) revealed accuracy-latency-cost trade-offs, with a 7B model providing the best balance for interactive use. By routing between user conversation and code execution constrained to a transparent sandbox, while grounding prompts in schema-level context, the Talk2Data agent reliably retrieves actionable insights from tables and makes computations verifiable. Beyond the Talk2Data agent itself, we discuss implications for human-data interaction, trust in LLM-driven analytics, and future extensions toward large-scale multimodal assistants. Interacting with data often requires programming skills or statistical expertise, creating barriers for managers, analysts, and other non-technical users [1], [2]. Natural language interfaces (NLIs) aim to improve this information-seeking process by allowing users to query data conversationally [3], [4].
At the same time, voice interfaces are becoming increasingly common in daily life, yet existing voice assistants remain limited: they can answer factual questions or control devices, but they lack the analytical capabilities needed for meaningful data exploration.
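The routing described above, where each user turn is dispatched either to conversation or to sandboxed code execution over the dataset, can be sketched in a few lines. All components here are illustrative stubs, not the paper's code: the real system uses Whisper for ASR and an LLM for routing and code generation, while this sketch hard-codes both.

```python
# Minimal sketch of an agentic routing loop in the style described above.
# Every function here is an illustrative stand-in, not the Talk2Data code.

def needs_computation(query):
    # Toy router; the real system would ask the LLM to classify the turn.
    keywords = ("mean", "plot", "count", "sum", "correlat")
    return any(k in query.lower() for k in keywords)

def run_sandboxed(code, context):
    # Stand-in for a restricted execution tool: the snippet sees only the
    # dataset and a whitelist of callables, not the full builtins.
    env = {"__builtins__": {}, "data": context["data"], "sum": sum, "len": len}
    exec(code, env)
    return env.get("result")

def answer(query, context):
    if needs_computation(query):
        # A code-generation LLM would produce this snippet from a
        # schema-grounded prompt; we hard-code one for illustration.
        code = "result = sum(data) / len(data)"
        return run_sandboxed(code, context)
    return "This column records monthly sales."  # conversational branch

ctx = {"data": [10, 20, 30]}
print(answer("What is the mean of sales?", ctx))  # 20.0
```

Constraining the executed snippet to a namespace with no builtins is one simple way to make the computation both safe and inspectable, matching the "transparent sandbox" idea in the abstract.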


A Survey of Dimension Estimation Methods

Binnie, James A. D., Dłotko, Paweł, Harvey, John, Malinowski, Jakub, Yim, Ka Man

arXiv.org Artificial Intelligence

It is a standard assumption that datasets in high dimension have an internal structure which means that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to understand the real dimension of the data, hence the complexity of the dataset at hand. A great variety of dimension estimators have been developed to find the intrinsic dimension of the data but there is little guidance on how to reliably use these estimators. This survey reviews a wide range of dimension estimation methods, categorising them by the geometric information they exploit: tangential estimators which detect a local affine structure; parametric estimators which rely on dimension-dependent probability distributions; and estimators which use topological or metric invariants. The paper evaluates the performance of these methods, as well as investigating varying responses to curvature and noise. Key issues addressed include robustness to hyperparameter selection, sample size requirements, accuracy in high dimensions, precision, and performance on non-linear geometries. In identifying the best hyperparameters for benchmark datasets, overfitting is frequent, indicating that many estimators may not generalise well beyond the datasets on which they have been tested.
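The tangential estimators mentioned above detect a local affine structure; the classic instance is local PCA. A minimal sketch, with the neighbourhood size and variance threshold chosen arbitrarily for illustration:

```python
import numpy as np

def local_pca_dimension(points, center_idx, k=20, var_threshold=0.95):
    """Tangential dimension estimate at one point via local PCA:
    take the k nearest neighbours and count how many principal
    components are needed to explain var_threshold of the variance."""
    x = points[center_idx]
    dists = np.linalg.norm(points - x, axis=1)
    neighbours = points[np.argsort(dists)[1:k + 1]]  # skip the point itself
    centred = neighbours - neighbours.mean(axis=0)
    # Squared singular values are proportional to explained variance.
    s = np.linalg.svd(centred, compute_uv=False)
    ratios = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(ratios, var_threshold) + 1)

# A 2-D plane embedded in 5-D ambient space should yield an estimate of 2.
rng = np.random.default_rng(0)
embed = np.zeros((200, 5))
embed[:, :2] = rng.normal(size=(200, 2))
print(local_pca_dimension(embed, 0))  # 2
```

The hyperparameter sensitivity the survey highlights is visible even here: the estimate depends directly on `k` and `var_threshold`, and on noisy or curved data different choices can change the answer.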


InfoMAE: Pair-Efficient Cross-Modal Alignment for Multimodal Time-Series Sensing Signals

Kimura, Tomoyoshi, Li, Xinlin, Hanna, Osama, Chen, Yatong, Chen, Yizhuo, Kara, Denizhan, Wang, Tianshi, Li, Jinyang, Ouyang, Xiaomin, Liu, Shengzhong, Srivastava, Mani, Diggavi, Suhas, Abdelzaher, Tarek

arXiv.org Artificial Intelligence

Standard multimodal self-supervised learning (SSL) algorithms regard cross-modal synchronization as implicit supervisory labels during pretraining, thus posing high requirements on the scale and quality of multimodal samples. These constraints significantly limit the performance of sensing intelligence in IoT applications, as the heterogeneity and the non-interpretability of time-series signals result in abundant unimodal data but scarce high-quality multimodal pairs. This paper proposes InfoMAE, a cross-modal alignment framework that tackles the challenge of multimodal pair efficiency under the SSL setting by facilitating efficient cross-modal alignment of pretrained unimodal representations. InfoMAE achieves efficient cross-modal alignment with limited data pairs through a novel information theory-inspired formulation that simultaneously addresses distribution-level and instance-level alignment. Extensive experiments on two real-world IoT applications are performed to evaluate InfoMAE's pairing efficiency to bridge pretrained unimodal models into a cohesive joint multimodal model. InfoMAE enhances downstream multimodal tasks by over 60% with significantly improved multimodal pairing efficiency. It also improves unimodal task accuracy by an average of 22%.


Riemannian Geometry for the classification of brain states with intracortical brain-computer interfaces

Marin-Llobet, Arnau, Manasanch, Arnau, Sanchez-Manso, Sergio, Tresserras, Lluc, Zhang, Xinhe, Hua, Yining, Zhao, Hao, Torao-Angosto, Melody, Sanchez-Vives, Maria V, Porta, Leonardo Dalla

arXiv.org Artificial Intelligence

This study investigates the application of Riemannian geometry-based methods for brain decoding using invasive electrophysiological recordings. Although previously employed in non-invasive settings, the utility of Riemannian geometry for invasive datasets, which are typically smaller and scarcer, remains less explored. Here, we propose a Minimum Distance to Mean (MDM) classifier using a Riemannian geometry approach based on covariance matrices extracted from intracortical Local Field Potential (LFP) recordings across various regions during different brain state dynamics. For benchmarking, we evaluated the performance of our approach against Convolutional Neural Networks (CNNs) and Euclidean MDM classifiers. Our results indicate that the Riemannian geometry-based classification not only achieves a superior mean F1 macro-averaged score across different channel configurations but also requires up to two orders of magnitude less computational training time. Additionally, the geometric framework reveals distinct spatial contributions of brain regions across varying brain states, suggesting a state-dependent organization that traditional time series-based methods often fail to capture. Our findings align with previous studies supporting the efficacy of geometry-based methods and extend their application to invasive brain recordings, highlighting their potential for broader clinical use, such as brain-computer interface applications.
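The MDM idea is compact enough to sketch with generic SPD-matrix tooling. Below is a minimal illustration using the affine-invariant Riemannian distance and a simplified log-Euclidean class mean (full pipelines typically compute an iterative Frechet mean); the class names and toy data are assumptions for the sketch, not the authors' code.

```python
import numpy as np
from scipy.linalg import eigh, logm, expm

def riemannian_distance(A, B):
    """Affine-invariant distance between SPD matrices: the root sum of
    squared log generalized eigenvalues of the pencil (B, A)."""
    w = eigh(B, A, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def log_euclidean_mean(mats):
    # Simplified class mean in the log-Euclidean metric, a common
    # stand-in for the iterative Frechet mean on the SPD manifold.
    return expm(np.mean([logm(m) for m in mats], axis=0))

class RiemannianMDM:
    """Minimum Distance to Mean: assign each covariance matrix to the
    class whose geometric mean is nearest in the Riemannian metric."""
    def fit(self, covs, labels):
        self.means_ = {c: log_euclidean_mean([m for m, y in zip(covs, labels) if y == c])
                       for c in set(labels)}
        return self

    def predict(self, covs):
        return [min(self.means_, key=lambda c: riemannian_distance(self.means_[c], m))
                for m in covs]

# Toy example: two "brain states" with very different covariance scales.
rng = np.random.default_rng(1)
def spd(scale):
    a = rng.normal(size=(4, 4))
    return scale * (a @ a.T + 4 * np.eye(4))

train = [spd(1.0) for _ in range(5)] + [spd(25.0) for _ in range(5)]
clf = RiemannianMDM().fit(train, [0] * 5 + [1] * 5)
print(clf.predict([spd(1.0), spd(25.0)]))  # [0, 1]
```

Because the classifier only stores one mean matrix per class and evaluates closed-form distances, its training cost is trivial, which is consistent with the large training-time advantage over CNNs reported above.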


Inroads to a Structured Data Natural Language Bijection and the role of LLM annotation

Vente, Blake

arXiv.org Artificial Intelligence

This work finds limited evidence supporting the theory that using multiple tasks with sequence-to-sequence transformer language models can improve performance on some metrics. In particular, the multi-task generalist t5-small outperforms the specialist t5-small with an $F_1$ of $0.771$, up from $0.692$, which may point to underlying cross-task knowledge generalization. This further suggests that even with the same network, "re-using" the same data in a different way may lead to higher performance on some metrics. However, the inverse task alone is likely only an optimization strategy, since it does not yield a significant general improvement at the model sizes explored in this work. Also, adding $\approx 4500$ LLM-annotated records (interlaced with the $12800$ WebNLG training records) does not substantially change automatic metric performance compared to the same t5-small model without the synthetic data. This may be due to a learning capacity bottleneck on account of model size, and the decreases observed may be due to distributional differences in the corpora. Future research using larger models or human evaluation is required to more fully explain the mechanisms contributing to performance on these tasks.


Gates Are Not What You Need in RNNs

Zakovskis, Ronalds, Draguns, Andis, Gaile, Eliza, Ozolins, Emils, Freivalds, Karlis

arXiv.org Artificial Intelligence

Recurrent neural networks have flourished in many areas. Consequently, we can see new RNN cells being developed continuously, usually by creating or using gates in a new, original way. But what if we told you that gates in RNNs are redundant? In this paper, we propose a new recurrent cell called Residual Recurrent Unit (RRU) which beats traditional cells and does not employ a single gate. It is based on the residual shortcut connection, linear transformations, ReLU, and normalization. To evaluate our cell's effectiveness, we compare its performance against the widely-used GRU and LSTM cells and the recently proposed Mogrifier LSTM on several tasks, including polyphonic music modeling, language modeling, and sentiment analysis. Our experiments show that RRU outperforms the traditional gated units on most of these tasks. Also, it has better robustness to parameter selection, allowing immediate application in new tasks without much tuning.
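The four ingredients listed above (residual shortcut, linear transformation, ReLU, normalization) can be combined into a gate-free recurrent step as follows. This is an illustrative simplification in the spirit of the RRU, not the paper's exact parameterization, and the weight initialization and normalization constant are assumptions.

```python
import numpy as np

def gate_free_cell(x_seq, hidden_size, seed=0):
    """A gate-free recurrent cell sketch: each step applies a linear map
    to [h, x], then ReLU, then normalization, and finally a residual
    shortcut from the previous hidden state. No gates anywhere."""
    rng = np.random.default_rng(seed)
    in_dim = hidden_size + x_seq.shape[1]
    W = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(in_dim, hidden_size))
    h = np.zeros(hidden_size)
    for x in x_seq:
        z = np.concatenate([h, x]) @ W              # linear transformation
        z = np.maximum(z, 0.0)                      # ReLU
        z = (z - z.mean()) / (z.std() + 1e-6)       # normalization
        h = h + z                                   # residual shortcut
    return h

seq = np.sin(np.linspace(0, 3, 12)).reshape(-1, 1)
print(gate_free_cell(seq, hidden_size=8).shape)  # (8,)
```

The residual path plays the role that gates usually play in stabilizing gradient flow through time, which is the core claim of the abstract.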


Robot Control based on Motor Primitives -- A Comparison of Two Approaches

Nah, Moses C., Lachner, Johannes, Hogan, Neville

arXiv.org Artificial Intelligence

Motor primitives are fundamental building blocks of a controller which enable dynamic robot behavior with minimal high-level intervention. By treating motor primitives as basic "modules," different modules can be sequenced or superimposed to generate a rich repertoire of motor behavior. In robotics, two distinct approaches have been proposed: Dynamic Movement Primitives (DMPs) and Elementary Dynamic Actions (EDAs). While both approaches instantiate similar ideas, significant differences also exist. This paper attempts to clarify the distinction and provide a unifying view by delineating the similarities and differences between DMPs and EDAs. We provide eight robot control examples, including sequencing or superimposing movements, managing kinematic redundancy and singularity, obstacle avoidance, and managing physical interaction. We show that the two approaches clearly diverge in their implementation. We also discuss how DMPs and EDAs might be combined to get the best of both approaches. With this detailed comparison, we enable researchers to make informed decisions to select the most suitable approach for specific robot tasks and applications.
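To make the DMP side of the comparison concrete, here is a minimal discrete DMP rollout without a learned forcing term: the transformation system reduces to a critically damped spring-damper that converges to the goal. Real DMPs add a phase-driven forcing function f(x) to shape the trajectory; the gains below are the conventional critically damped choice, and everything else is an illustrative sketch.

```python
import numpy as np

def dmp_rollout(y0, goal, steps=500, dt=0.01, alpha_z=25.0, beta_z=6.25, tau=1.0):
    """Euler-integrate the DMP transformation system
    tau*dz = alpha_z*(beta_z*(g - y) - z), tau*dy = z,
    with the forcing term omitted for brevity."""
    y, z = y0, 0.0
    traj = []
    for _ in range(steps):
        z += dt * alpha_z * (beta_z * (goal - y) - z) / tau
        y += dt * z / tau
        traj.append(y)
    return np.array(traj)

path = dmp_rollout(y0=0.0, goal=1.0)
print(round(path[-1], 3))  # 1.0
```

With beta_z = alpha_z / 4 the system is critically damped, so the rollout approaches the goal without overshoot; sequencing primitives then amounts to switching the goal (or superimposing outputs), which is the modularity the abstract describes.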


Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey

She, Xinyu, Liu, Yue, Zhao, Yanjie, He, Yiling, Li, Li, Tantithamthavorn, Chakkrit, Qin, Zhan, Wang, Haoyu

arXiv.org Artificial Intelligence

Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research focused on learning-based code intelligence, such as automated bug repair and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to potential pitfalls, which hinder realistic performance and further impact their reliability and applicability in real-world deployment. Such challenges drive the need for a comprehensive understanding - not just identifying these issues but delving into their possible implications and existing solutions to build more reliable language models tailored to code intelligence. Based on a well-defined systematic research approach, we conducted an extensive literature review to uncover the pitfalls inherent in LM4Code. In total, 67 primary studies from top-tier venues were identified. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and conducted a systematic study to summarize the issues, implications, current solutions, and challenges of different pitfalls for LM4Code systems. We developed a comprehensive classification scheme that dissects pitfalls across four crucial aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we aim to provide a roadmap for researchers and practitioners, facilitating their understanding and utilization of LM4Code in reliable and trustworthy ways.


Parmesan: mathematical concept extraction for education

Collard, Jacob, de Paiva, Valeria, Subrahmanian, Eswaran

arXiv.org Artificial Intelligence

Mathematics is a highly specialized domain with its own unique set of challenges that has seen limited study in natural language processing. However, mathematics is used in a wide variety of fields and multidisciplinary research in many different domains often relies on an understanding of mathematical concepts. To aid researchers coming from other fields, we develop a prototype system for searching for and defining mathematical concepts in context, focusing on the field of category theory. This system, Parmesan, depends on natural language processing components including concept extraction, relation extraction, definition extraction, and entity linking. In developing this system, we show that existing techniques cannot be applied directly to the category theory domain, and suggest hybrid techniques that do perform well, though we expect the system to evolve over time. We also provide two cleaned mathematical corpora that power the prototype system, which are based on journal articles and wiki pages, respectively. The corpora have been annotated with dependency trees, lemmas, and part-of-speech tags.


Comparing Reinforcement Learning and Human Learning using the Game of Hidden Rules

Pulick, Eric, Menkov, Vladimir, Mintz, Yonatan, Kantor, Paul, Bier, Vicki

arXiv.org Artificial Intelligence

Reliable real-world deployment of reinforcement learning (RL) methods requires a nuanced understanding of their strengths and weaknesses and how they compare to those of humans. Human-machine systems are becoming more prevalent and the design of these systems relies on a task-oriented understanding of both human learning (HL) and RL. Thus, an important line of research is characterizing how the structure of a learning task affects learning performance. While increasingly complex benchmark environments have led to improved RL capabilities, such environments are difficult to use for the dedicated study of task structure. To address this challenge we present a learning environment built to support rigorous study of the impact of task structure on HL and RL. We demonstrate the environment's utility for such study through example experiments in task structure that show performance differences between humans and RL algorithms.